Automatic peak identification method

ABSTRACT

The invention described herein details a protocol to improve analysis and peak identification in spectroscopic data. Bayesian methods are used to automatically identify peaks in data sets. After identifying peak shapes, the method tests the hypothesis that a given number of peaks is found within any given data window. If a peak is identified within a given window, then the likelihood function is maximized in order to estimate peak position and amplitude. This process yields a spectrum with high resolution and minimal artifacts. The method described herein is particularly useful for identifying peaks in data sets obtained from spectroscopy.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from U.S. Provisional Application Ser. No. 60/664,010, filed Mar. 22, 2005.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. 5R44CA101479-03 awarded by NIH. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

The invention is related to the field of analyzing data which includes multiple-peak data and, in particular, pertains to an automatic method for analyzing such data that yields high-resolution estimates.

Work in chromatography, spectroscopy, engineering, pharmacology, and other fields frequently requires analysis of data sets exhibiting multiple peaks. Analysis of multi-peak data is particularly difficult when peaks overlap, or when data are noisy. Peak-finding algorithms of many types are used to identify and interpret peaks and peak positions.

Present state-of-the-art peak finders are based upon 1] simple matched filters, wavelets, or searches for local maxima in the time series (resulting in less-than-optimal resolution), 2] parameter fitting to peaks using optimization methods, sometimes on the entire time series (such methods are slow), or 3] direct methods which involve significant operator interaction (i.e., placing cursors over peaks on a computer screen, estimating baselines by eye, etc.). All of these methods have disadvantages. The method described herein runs essentially like a filter, results in much higher resolution estimates of peak positions and amplitudes than the existing state of the art, and is almost completely automatic.

SUMMARY OF THE INVENTION

The invention described herein details a protocol to improve analysis and peak identification in spectroscopic data. Bayesian methods are used to automatically identify peaks in data sets. After identifying peak shapes, the method tests the hypothesis that a given number of peaks is found within any given data window. If a peak is identified within a given window, then the likelihood function is maximized in order to estimate peak position and amplitude.

This process yields a spectrum with high resolution and minimal artifacts. Furthermore, the process is not overly computationally intensive, and requires only short delay times. A method that simply maximizes the likelihood function is likely to generate artifacts, while a method that only tests the hypothesis that a peak is present, without the subsequent maximization of the likelihood function, will yield a spectrum with decreased resolution.

The peak identification method described herein is potentially useful in any data set wherein peaks of interest are distinguished. The method described herein is useful for identifying peaks in data sets obtained from spectroscopy, including but not limited to various forms of absorption and emission spectroscopy, including absorption spectroscopy, UV/VIS spectroscopy, IR spectroscopy, X-ray spectroscopy, X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy, and Raman spectroscopy. In particular, the methods described herein are useful for applications in mass spectroscopy, including TOF-SIMS and SELDI-MS. For example, the methods described herein are useful for analyzing TOF-SIMS mass spectroscopic data related to proteomics. The methods described herein are not limited to spectroscopic applications; for example, the methods are particularly useful for imaging applications.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a sample spectrum, in which the number of ions is plotted versus the time required for the ions to reach the detector.

FIG. 2 shows four panels from a TOF-SIMS spectrum, with each successive panel corresponding to successive manipulations of the data.

FIG. 3 depicts a small portion of a TOF/SIMS spectrum of angiotensin.

DETAILED DESCRIPTION OF THE INVENTION

The following symbols are accorded the following meanings:

-   N: total number of data points in the window
-   s_(j) = s(t_(j)), j = 1 . . . N: observed signal prior to any processing
-   x_(j) = x(t_(j)): ideal wavelet shape
-   η_(j) = η(t_(j)): noise
-   σ_(η)² = ⟨ηη^(T)⟩: noise covariance

Herein we describe the use of Bayesian methods to automatically identify peaks in data sets. We start with a brief overview of the logic. The first step is to compare the hypotheses that there are a given number of peaks (i.e., 0, 1, 2, . . . n) in the observation window. If a peak is present, we then estimate its position and amplitude via parameter fitting.

After this summary of the logic, we develop the algebra for two simple cases where Gaussian or Poisson noise is involved. Both of these cases are simple enough that all results can be arrived at in closed form. We first present how to derive a matched filter for a known peak shape that is embedded in additive white (Gaussian) noise of unknown amplitude. We then show that a similar procedure can be applied to Poisson-noised data. Some preliminary results on SELDI-MS (Surface Enhanced Laser Desorption Ionization Mass Spectrometry) and TOF-SIMS (Time of Flight Secondary Ionization Mass Spectrometry) data are shown. It is easy to add color (finite time correlations) to the noise. Once these simple situations are understood, extending to other types of noise, such as combinations of Poisson, Gaussian, and coherent 'noise' processes, is logically straightforward, though algebraically more involved. All of the methods described herein require knowledge of the 'target' peak shape.

FIG. 1 shows a sample mass spectrum, in which the number of ions is plotted versus the time that ions take to fly to the detector. Ideally, ions of a specific m/z would exhibit the same time of flight, resulting in a sharp peak line shape like a delta function. However, in reality, they enter the drift region with velocity and time distributions of finite width, which results in a finite peak width for ions of a specific m/z.

1.1 Description of the Underlying Assumptions

1.1.1 The Relationship Between Peak Signal and Probability Density

In counting experiments, ideally only one ion of a given mass arrives, or at most a few ions of the same mass arrive, at the detector during a given temporal sampling interval and are counted individually. The measurement is repeated many times under nominally identical conditions and one gradually builds up a portrait of the probability distribution of arrival times for each ion mass. This means that the final time series s(t) that is to be analyzed after all the data has been gathered is proportional to the conditional probability distribution

$$s(t_k)\,\Delta t \propto p\left(t_k \mid f_0(v_*(m), t_*)\right)\Delta t, \qquad (1)$$

wherein s(t_k) is the signal amplitude at time t_k, Δt is the sampling interval, and p(t_k|f₀(v_*(m),t_*)) is the probability that an ion of mass m will strike the detector between t_k and t_k+Δt given that it entered the drift region at time t_* with a velocity close to v_*(m). If only one ion species is present, the initial velocity distribution f₀(v_*(m),t_*) is just a sharply peaked function of v of the form

$$f_0\left(v_*(m), t_*\right) = g\!\left(\frac{v - v_*(m)}{\sigma(m)}\right), \qquad \int ds\, g(s) = 1. \qquad (2)$$

If a variety of different ions is present, f₀ will be a superposition of terms like (2), properly normalized so that f₀ will have unit total area. This assumes that the shape of the velocity spread is universal, up to translation and rescaling. We will assume v_*(m) and σ(m) to be weak functions of the mass, and to scale like 1/√m. Fixing their functional form is part of the calibration process. In practice, the velocity distributions for each peak will contain information about the dynamics of the ion formation process, which depends upon the ion species. However, if we assume that the ion optics are designed so that they will only pass a narrow range of the most probable velocities for any given mass, then maximizing the entropy of the probability distribution f₀ for a single peak, given v_* and σ, leads to the choice of a Gaussian for g:

$$g\!\left(\frac{v - v_*}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(v - v_*)^2}{2\sigma^2}\right). \qquad (3)$$

The choice of the maximum-entropy distribution is founded upon the principle that it maximizes the number of possible outcomes of repeated observations that are consistent with this assignment of the probability. Hence, it is the least biasing assignment of the probability that is consistent with our limited knowledge of the initial distribution. Transforming from the initial velocity distribution g to the temporal peak shape requires solution of a simple Fokker-Planck equation, and is described later.

1.1.2 The Assumption of Independence

The observed peaks for the portion of a spectrum shown in FIG. 1 have finite width in time. One of the causes is the finite width of the velocity distribution upon entering free flight. However, it is important to realize that ions that arrive between time t_k and t_k+Δt are independent of those that arrive at any other time. This is because, as stated in the previous section, each event/measurement only results in one or at most a few ions of a given m/z. The resulting spectrum is an accumulation of many measurements under the same experimental conditions. Therefore, the joint probability of arrival of two ions at two different times satisfies

$$p(t_k, t_j \mid f_0) = p(t_k \mid f_0)\, p(t_j \mid f_0), \qquad j \neq k, \qquad (4)$$

even for counts associated with the same mass peak. Any correlations in the signal are assumed to be due to the electronics. If such correlations are present (as they are in MALDI and SELDI spectra) then this must be included in the analysis as part of the model used, as will be discussed later. The assumption of independence, however, underlies the entire analysis that we pursue below in the first few examples.

Because the measurement is repeated many times under 'identical' conditions, producing the same f₀ at each repetition (with the 'clock time' t reset to zero at the start of each measurement), if a total number of n ions are counted over the entire course of the measurement, then the total number of counts between t_k and t_k+Δt is very nearly

$$n_k \approx p(t_k \mid f_0)\, n. \qquad (5)$$

1.2 Model Comparisons (i.e. Counting Peaks in the Window, Comparing Noise Models)

We now consider the peak of a particular mass m. We introduce a window that includes only N points in the time series. The width of the window is chosen to be a slowly varying function of the mass so that it typically will run only from the half-maxima to the left and right of a 'typical' peak of mass m. The change in window width is due to the nature of the time-of-flight apparatus, where heavier ions will arrive later in time and show up with a wider peak shape. This is a rough-and-ready compromise between the desire to include as much data as possible in the window to improve the sampling statistics, and the realization that peaks will overlap and that our peak shape model is probably not very good out on the tails of the peak.

Focus attention now on a particular window position. We label the window location t₀ in such a way that if a mass peak appeared in the window with zero width then the likelihood function (defined momentarily) would be maximized at t₀. Because of the mass-dependent asymmetry in peak line shape, t₀ is certainly not at the leading or trailing edge of the window. It is also not located at the center of the window, nor at the maximum of the 'target' temporal peak shape, but lies somewhere between these two points at a position which is weakly dependent upon m. In this way, we correct for systematic errors in the estimation of the peak shape due to its mass-dependent asymmetry.

We define the target shape of the temporal peak to be x(t − t₀), which is then sampled at discrete time intervals, giving the N-component vector

$$\mathbf{x} = (x_1, \ldots, x_N) \equiv \left(x(t_1 - t_0), \ldots, x(t_N - t_0)\right). \qquad (6)$$

This 'slides' with the window and for convenience is normalized to have unit area:

$$\sum_{k=1}^{N} x_k = 1. \qquad (7)$$

This will only introduce a constant correction for the peak amplitude computed later. We now assume that the observed signal s(t) within the window is given by

$$s_k = \alpha x_k + \xi_k, \qquad k = 1, 2, \ldots, N, \qquad (8)$$

where α is an unknown amplitude and ξ = (ξ₁, ξ₂, . . . ξ_N) is a random process of some appropriate type. If more than one peak is present in the window, this complicates the analysis because more parameters must be fit, but the logic is similar to what we describe below. For counting experiments, we do not observe the signal s(t) directly, but instead count how many times it crosses some detection threshold in a time interval Δt. This is because a single ion striking a micro-channel plate, or a channeltron, will initiate a cascade of electron emission that amplifies the original micro-pulse into one that is macroscopically detectable. The noise process will be Poisson with some appropriately chosen rate, while for analog experiments it is probably a combination of Poisson and Gaussian processes. For the moment, we leave the choice of noise model aside and simply assert that choosing a noise model is part of the model selection process, and choosing between noise models is part of the hypothesis testing we now describe. A brief numerical illustration of this window model follows.
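As a concrete illustration of equations (7) and (8), the following Python sketch builds a unit-area template and a synthetic windowed signal. The Gaussian template shape and all parameter values are illustrative assumptions chosen for demonstration, not part of the invention.

```python
import numpy as np

# Illustrative window model of equations (7)-(8): a unit-area template x
# plus additive noise. Template shape and parameters are assumptions.
N = 64                                    # data points in the window
t = np.arange(N)
x = np.exp(-0.5 * ((t - N / 2.0) / 4.0) ** 2)
x /= x.sum()                              # equation (7): unit area

alpha_true = 50.0                         # unknown amplitude alpha
rng = np.random.default_rng(0)
s = alpha_true * x + rng.normal(0.0, 0.1, N)   # equation (8)
```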

Under the assumption that the ion counts at different times are independent, the probability of observing the particular count sequence n = (n₁, n₂, . . . , n_N) is simply

$$p(\mathbf{n} \mid a, \lambda, t_0, M) = p\left(\{n_1, n_2, \ldots, n_N\} \mid a, \lambda, t_0, M\right) = \prod_{k=1}^{N} p(n_k \mid a, \lambda, t_0, M). \qquad (9)$$

By the notation λ we indicate a parameter that characterizes the noise process (e.g. the variance σ for a Gaussian process or the 'dark' rate r₀ for a Poisson process). By M we mean to emphasize a particular choice of model class, including a peak shape x(t) and a noise model. The probability (9) is the likelihood function of observing the data in the window labeled t₀, given the model class M and the parameters.

We first want to decide whether there is a peak in the window or not.This is a comparison between two hypotheses:

-   H₀ = There is no peak in the window t₀. The data are noise of the
    assumed type. Let's call the associated pure-noise model M₀.
-   H₁ = There is a single peak in the window t₀ with the shape x.
    Deviations from this shape in the data are due to noise of the
    assumed type. Let's call the associated peak-plus-noise model M₁.

We wish to compute the probability that each of these hypotheses is true, given the data, and then take the ratio

$$\frac{p(H_1 \mid \mathbf{n})}{p(H_0 \mid \mathbf{n})} = \frac{p(M_1 \mid \mathbf{n}, t_0)}{p(M_0 \mid \mathbf{n}, t_0)}. \qquad (10)$$

If this ratio is large compared to one, we can be confident that there is probably a peak in the window, while if it is approximately one we interpret that as saying there is only weak evidence of a peak in the window (because α = 0 is a possible estimate of the peak amplitude, which we interpret as 'no peak'). Therefore, one may set an appropriate threshold for peak detection. In order to derive (10) from (9), we first invoke Bayes' theorem, which states:

$$p(A \mid B)\, p(B) = p(B \mid A)\, p(A) \;\Rightarrow\; p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}. \qquad (11)$$

Therefore, identifying A as the model class M_k, and B as the observed data in the time window t₀:

$$p(M_k \mid \mathbf{n}, t_0) = \frac{p(\mathbf{n} \mid M_k, t_0)\, p(M_k \mid t_0)}{p(\mathbf{n} \mid t_0)}. \qquad (12)$$

Since we are given that we observed the data n in the window t₀, the denominator is unity. If we have no reason to prefer one model class over another, we should assign them all equal prior probabilities. For example, if we are comparing two types of models (a single peak vs. no peak), then p(M₀|t₀) = p(M₁|t₀) = ½. Therefore, we have the simple result that

$$p(M_k \mid \mathbf{n}, t_0) = \tfrac{1}{2}\, p(\mathbf{n} \mid M_k, t_0). \qquad (13)$$

Hence, we need to calculate the likelihood that we observed the data given the model class, p(n|M_k,t₀), a quantity called the evidence. We get this by integrating the likelihood function (9) over the model parameters using an appropriate prior for the parameters:

$$p(\mathbf{n} \mid M_k, t_0) = \int da\, d\lambda\; p(\mathbf{n} \mid a, \lambda, M_k, t_0)\, p(a, \lambda \mid M_k, t_0), \qquad (14)$$

where p(a,λ|M_k,t₀) is the prior distribution of the parameters (a,λ). For each model class, there will be a prior probability distribution for the parameters. For example, when no peak is present we choose as the prior:

$$p(a, \lambda \mid t_0, M_0) = \tfrac{1}{2}\,\delta(a)\, p(\lambda \mid t_0, M_0), \qquad (15)$$

where the ½ factor appears because the δ-function will be integrated only over the positive values of a. If we know nothing about the values of λ, then we choose a uniform prior, or some other prior that is very broad in λ-space, on the grounds that when we integrate against (9) only the neighborhood of the maximum-likelihood value of λ will contribute.

The likelihood of the data given the model (14) will be a bit less mysterious when we calculate it for specific models momentarily. Before doing so, we summarize: to compare the hypotheses that there is, or is not, a peak in the window, we need to compute the ratio (10), which requires computation of (14) for each model. Setting the threshold for the level of evidence we require for 'peak detection' is a separate issue. Given that we have detected that a peak lies within a certain region of the time axis, we then fix the position and amplitude of the peak by maximizing the likelihood over the parameters (a, λ, t₀), as discussed in the next section. Notice, however, that maximizing over t₀ has a different logical character than the other parameters, because we are comparing different data sets as we slide the window across the peak. The justification is based upon the physical reasonableness of the approach: the width of the window is large compared to the uncertainty in the position of the peak; hence, near the maximum of the likelihood, most of the data being compared comes from overlapping windows.

1.3 Parameter Fitting

As the window slides across the spectrum one point at a time, the ratio (10) is calculated for each window position. We can then determine in which regions of the spectrum we are confident that there is a peak. We then look into particular regions of interest and fit the parameters (a, λ) by maximizing equation (9), i.e. by solving the following equations for each window t₀:

$$\frac{\partial L}{\partial a} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0, \qquad (16)$$

where L(t₀) is the natural logarithm of the likelihood function (9):

$$L(t_0) = \ln p(\mathbf{n} \mid a, \lambda, M_k, t_0). \qquad (17)$$

Solving equations (16) gives the maximum-likelihood estimates of the parameters (a*, λ*). If the data are very informative, then the likelihood will be sharply peaked around the point (a*, λ*) in parameter space. It is natural to Taylor expand (17) around (a*, λ*):

$$\begin{aligned} L &\approx L(a^*, \lambda^*) + \frac{1}{2}\left[\left.\frac{\partial^2 L}{\partial a^2}\right|_{a^*,\lambda^*}\!(a - a^*)^2 + 2\left.\frac{\partial^2 L}{\partial a\,\partial\lambda}\right|_{a^*,\lambda^*}\!(a - a^*)(\lambda - \lambda^*) + \left.\frac{\partial^2 L}{\partial \lambda^2}\right|_{a^*,\lambda^*}\!(\lambda - \lambda^*)^2\right] \\ &= L(a^*, \lambda^*) + \frac{1}{2}(X - X^*)'\,\nabla\nabla L(a^*, \lambda^*)\,(X - X^*), \qquad \text{where } X = \begin{bmatrix} a \\ \lambda \end{bmatrix}, \text{ and} \end{aligned} \qquad (18)$$

$$\nabla\nabla L = \begin{bmatrix} \dfrac{\partial^2 L}{\partial a^2} & \dfrac{\partial^2 L}{\partial a\,\partial\lambda} \\[2ex] \dfrac{\partial^2 L}{\partial a\,\partial\lambda} & \dfrac{\partial^2 L}{\partial \lambda^2} \end{bmatrix} \qquad (19)$$

is the Hessian matrix. It follows from (18) that the likelihood function (9) is approximately:

$$p(\mathbf{n} \mid a, \lambda, M, t_0) = \exp\left[L(t_0)\right] \approx \exp\!\left(L(a^*, \lambda^*) + \tfrac{1}{2}(X - X^*)'\,\nabla\nabla L(a^*, \lambda^*)\,(X - X^*)\right) = e^{L(a^*,\lambda^*)}\, e^{\frac{1}{2}(X - X^*)'\,\nabla\nabla L(a^*,\lambda^*)\,(X - X^*)}. \qquad (20)$$

This implies that the likelihood function (9) looks like a multivariate normal distribution in parameter space, with uncertainties:

$$\sigma_a = \left(-\left.\frac{\partial^2 L}{\partial a^2}\right|_{a^*,\lambda^*}\right)^{-1/2}, \qquad \sigma_\lambda = \left(-\left.\frac{\partial^2 L}{\partial \lambda^2}\right|_{a^*,\lambda^*}\right)^{-1/2}. \qquad (21)$$

Moreover, when computing the evidence, i.e. marginalizing the likelihood function (9) over the prior p(a,λ|M_k,t₀) as seen in equation (14), if the prior is independent of (a, λ), for example if a and λ are uniformly distributed in some regions (a_min, a_max) and (λ_min, λ_max), then substituting equation (20) into equation (14) results in the following:

$$\begin{aligned} p(\mathbf{n} \mid M_k, t_0) &= \int da\, d\lambda\; p(\mathbf{n} \mid a, \lambda, M_k, t_0)\, p(a, \lambda \mid M_k, t_0) \\ &= \int_{a_{\min}}^{a_{\max}}\!\!\int_{\lambda_{\min}}^{\lambda_{\max}} da\, d\lambda\; e^{L(a^*,\lambda^*)}\, e^{\frac{1}{2}(X - X^*)'\,\nabla\nabla L(a^*,\lambda^*)\,(X - X^*)}\, \frac{1}{a_{\max} - a_{\min}}\, \frac{1}{\lambda_{\max} - \lambda_{\min}} \\ &= \frac{1}{a_{\max} - a_{\min}}\, \frac{1}{\lambda_{\max} - \lambda_{\min}}\, e^{L(a^*,\lambda^*)}\, \frac{(2\pi)^{m/2}}{\sqrt{\det\left[\nabla\nabla L(a^*, \lambda^*)\right]}}, \end{aligned} \qquad (22)$$

where m in equation (22) is the dimension of the parameter space. In the last step of the integration, the lower and upper boundaries of integration are extended to infinity. This is justified on the basis that (a_min, a_max) and (λ_min, λ_max) are large enough that contributions from outside these regions are negligible. Note that det[∇∇L(a*,λ*)] is the determinant of the Hessian matrix evaluated at (a*,λ*), and 1/√det[∇∇L(a*,λ*)] is proportional to the volume within σ_a and σ_λ around (a*,λ*) in parameter space. A numerical sketch of this Laplace approximation follows.
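The Laplace approximation in equation (22) is easy to evaluate numerically. The sketch below is a non-authoritative illustration: it approximates the log evidence from any twice-differentiable negative log-likelihood, a maximum-likelihood point, and assumed uniform-prior widths, with a finite-difference Hessian standing in for the analytic ∇∇L.

```python
import numpy as np

def numeric_hessian(f, theta, h=1e-4):
    """Finite-difference Hessian of f at theta (f = -ln likelihood)."""
    m = len(theta)
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei, ej = np.eye(m)[i] * h, np.eye(m)[j] * h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

def log_evidence_laplace(neg_log_like, theta_star, prior_widths):
    """Log of equation (22): peak likelihood x Gaussian volume / prior volume."""
    theta_star = np.asarray(theta_star, float)
    m = len(theta_star)
    H = numeric_hessian(neg_log_like, theta_star)   # H = -grad grad L
    _, logdet = np.linalg.slogdet(H)
    return (-neg_log_like(theta_star)               # L(a*, lambda*)
            + 0.5 * m * np.log(2.0 * np.pi) - 0.5 * logdet
            - np.sum(np.log(prior_widths)))         # uniform priors
```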

2. Finding a Peak in a Time-of-flight Mass Spectrum.

We now consider some specific examples, following the discussion above. The first concerns the detection of a peak in white Gaussian noise. We show that the approach described above leads to what is called the 'matched filter' in the engineering literature. After that, we move on to applications in 'counting' problems, where the noise is due to the discreteness of the underlying process and a Poisson noise model is more appropriate. The latter case is of particular interest as it is related to TOF-SIMS (Time of Flight Secondary Ionization Mass Spectrometry). The Gaussian noise case is also important as it may relate to other mass spectrometers, such as SELDI (Surface Enhanced Laser Desorption Ionization). In this section, we do not specify the peak shape for either the Gaussian noise case or the Poisson noise case; we simply use x_i to denote the peak shape in general. In section 3, before we present some preliminary results, we will choose a particular peak shape, motivated by the kinematics of TOF-MS instruments.

Before we go to detailed calculations for the Gaussian and Poisson noise cases, let us summarize: in order to compare models M₀ vs. M₁, we need to evaluate the ratio (10), which requires computing (14) by marginalizing (9) over the prior distribution p(a,λ|M_k,t₀). Parameters are fitted by maximizing the likelihood function (9), via solving equation (16).

For convenience, from now on, except for t₀, when we refer to parameters associated with model M₀, a subscript 0 will be used, and a subscript 1 will be used for parameters associated with model M₁. For both models, a superscript star * will be used for the best estimates of the parameters.

2.1 Some Preliminary Results for Gaussian Noise Models

2.1.1 Model Comparison

We assume the noise process is Gaussian and white:

$$\langle \eta(t_j) \rangle = \mu, \qquad \left\langle \left(\eta(t_j) - \mu\right)\left(\eta(t_k) - \mu\right) \right\rangle = \sigma_\eta^2\, \delta_{jk}, \qquad p_1\left(\eta(t_k)\right) = \frac{1}{\sqrt{2\pi}\,\sigma_\eta} \exp\!\left(-\frac{\left(\eta(t_k) - \mu\right)^2}{2\sigma_\eta^2}\right), \qquad (23)$$

wherein the subscript on the probability density indicates that this is the PDF for a single time step. We use this single-step PDF to compute the joint probability that we would have observed the N-step noise sequence η(t_k), k = 1, 2 . . . N.

We want to compare the model having a peak in the window (M₁) to the model lacking a peak in the window (M₀). Therefore, we need to evaluate (14) for both M₁ and M₀.

We first assume there is no peak present, only noise, and that each time step is independent of the others:

$$p_N(\eta \mid \mu_0, \sigma_{\eta 0}, M_0) = \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_{\eta 0}} \exp\!\left(-\frac{1}{2\sigma_{\eta 0}^2}\left(\eta(t_j) - \mu_0\right)^2\right). \qquad (24)$$

The notation η in the argument of the likelihood function represents an N-dimensional vector of noise values. Collecting terms, we find

$$p_N(\eta \mid \mu_0, \sigma_{\eta 0}, M_0) = \frac{1}{(2\pi)^{N/2}\,\sigma_{\eta 0}^{N}} \exp\!\left(-\frac{1}{2\sigma_{\eta 0}^2} \sum_{j=1}^{N}\left(\eta(t_j) - \mu_0\right)^2\right). \qquad (25)$$

The exponent is a quadratic function of μ₀; therefore we complete the square and write it as

$$\sum_{j=1}^{N}\left(\eta(t_j) - \mu_0\right)^2 = N\left(\mu_0 - \mu_*\right)^2 + N\sigma_*^2, \qquad (26)$$

where

$$\mu_* = \frac{1}{N}\sum_{j=1}^{N} \eta_j, \qquad \sigma_*^2 = \frac{1}{N}\sum_{j=1}^{N}\left(\eta_j - \mu_*\right)^2. \qquad (27)$$

The likelihood function now becomes:

$$p_N(\eta \mid \mu_0, \sigma_{\eta 0}, M_0) = \frac{1}{(2\pi)^{N/2}\,\sigma_{\eta 0}^{N}} \exp\!\left(-\frac{N\sigma_*^2}{2\sigma_{\eta 0}^2}\right) \exp\!\left(-\frac{N\left(\mu_0 - \mu_*\right)^2}{2\sigma_{\eta 0}^2}\right). \qquad (28)$$

From this form it is clear that the likelihood is maximized if we choose μ₀* = μ* and σ_{η0}* = σ*, with maximum:

$$p_N^*(\eta) \equiv p_N^*\left(\eta \mid \mu_0^*, \sigma_{\eta 0}^*, M_0\right) = \frac{1}{\left(2\pi e\, \sigma_{\eta 0}^{*2}\right)^{N/2}}. \qquad (29)$$

Comparing this value for different window positions can be used to measure the likelihood that the signal in each window is a Gaussian white noise process. However, comparing different models using those parameter values in each model that maximize the likelihood may not be the most robust way to proceed. Instead, it may be preferable to marginalize over all possible values of the model parameters (in this case μ₀ and σ_{η0}) to get the probability for the model that there is no peak in the window (M₀) given the signal observed in the window, which is parameter-free. Doing the same marginalization for the case where a peak is assumed in the window gives a parameter-free probability for model M₁. These two parameter-free probabilities are directly comparable.

In evaluating equation (14), we now assume that we know nothing about the prior probabilities for the mean and variance of the noise process, aside from the requirement that they be proper probabilities in the parameters μ₀ and σ_{η0}. The simplest choice is to assume that they are constant over some range. Marginalizing, we have:

$$p_N(\eta \mid M_0) = \int_{0}^{\infty} \frac{d\sigma_{\eta 0}}{\sigma_{\eta 0_{\max}}} \int_{-\infty}^{\infty} \frac{d\mu_0}{\mu_{0_{\max}}}\; p_N(\eta \mid \mu_0, \sigma_{\eta 0}, M_0). \qquad (30)$$

Performing the integral over the parameter μ₀ first, we are left with

$$p_N(\eta \mid M_0) = \frac{1}{\sigma_{\eta 0_{\max}}\mu_{0_{\max}}}\, \frac{1}{(2\pi)^{(N-1)/2}\sqrt{N}} \int_{0}^{\infty} \frac{d\sigma_{\eta 0}}{\sigma_{\eta 0}^{N-1}} \exp\!\left(-\frac{N\sigma_{\eta 0}^{*2}}{2\sigma_{\eta 0}^2}\right). \qquad (31)$$

Changing variables to

$$t = \frac{N\sigma_{\eta 0}^{*2}}{2\sigma_{\eta 0}^2}, \qquad (32)$$

we find

$$p_N(\eta \mid M_0) = \frac{1}{2\sqrt{2}\,\pi^{(N-1)/2}}\, \frac{\Gamma\!\left(\frac{N}{2} - 1\right)}{N^{\frac{N-1}{2}}\,\sigma_{\eta 0}^{*N-2}}\, \frac{1}{\sigma_{\eta 0_{\max}}\mu_{0_{\max}}}. \qquad (33)$$

Using Stirling's asymptotic series for the Γ-function, $\Gamma(x+1) \approx \sqrt{2\pi x}\, x^{x} e^{-x}(1 + \ldots)$,

$$\Gamma\!\left(\frac{N}{2} - 1\right) \approx e^{-\frac{N}{2}+1}\left(\frac{N}{2} - 1\right)^{\frac{N}{2}-\frac{3}{2}}\sqrt{2\pi} \approx e^{-\frac{N}{2}+1}\left(\frac{N}{2}\right)^{\frac{N-3}{2}}\sqrt{2\pi}, \qquad (34)$$

we find:

$$p_N(\eta \mid M_0) = \frac{1}{\left(2\pi e\,\sigma_{\eta 0}^{*2}\right)^{N/2}}\, \frac{\pi\sqrt{2}\,\sigma_{\eta 0}^{*2}}{N}\, \frac{e}{\sigma_{\eta 0_{\max}}\mu_{0_{\max}}} = p_N^*\, \frac{\pi\sqrt{2}\,\sigma_{\eta 0}^{*2}}{N}\, \frac{e}{\sigma_{\eta 0_{\max}}\mu_{0_{\max}}}. \qquad (35)$$

Notice that this still depends upon the window position. This result could have been anticipated by the following argument: the integration of the likelihood over μ₀ and σ_{η0} will be dominated by the region of μ₀σ_{η0}-space in the immediate neighborhood of the maximum-likelihood values μ₀* and σ_{η0}*. Over most of parameter space the likelihood is vanishingly small (if the data are 'informative'). The volume of space over which the likelihood is not vanishingly small is given by the product of the half-widths of the likelihood function in each direction (which are essentially the uncertainties in the parameter estimates). The uncertainties can be estimated from the second derivative of the ln of the likelihood function, and are of the order of Δμ₀ ≈ σ_{η0}*/√N and Δσ_{η0} ≈ σ_{η0}*/√(2N). Therefore, we have the appealing result that (35) is simply

$$p_N(\eta \mid M_0) = 2\pi\, p_N^* \left(\frac{\Delta\mu_0}{\mu_{0_{\max}}}\right)\left(\frac{\Delta\sigma_{\eta 0}}{\sigma_{\eta 0_{\max}}}\right) e. \qquad (36)$$

The results (35) and (29) depend only upon the noise model and are independent of the rest of the model (e.g. peak shapes and positions, if there are any). Equations (33), (35), and (36) are different expressions for the evidence. A numerical sketch of (33) follows.
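For numerical work it is safer to evaluate (33) in log form. A sketch, assuming uniform priors of widths σ_{η0max} and μ_{0max} as above, with scipy's gammaln standing in for ln Γ:

```python
import numpy as np
from scipy.special import gammaln

# Log form of the no-peak evidence (33); gammaln avoids overflow in
# Gamma(N/2 - 1) for realistic window lengths N.
def log_evidence_m0(window, sigma_max, mu_max):
    w = np.asarray(window, float)
    N = len(w)
    sigma_star = np.sqrt(np.mean((w - w.mean()) ** 2))   # equation (27)
    return (-np.log(2.0 * np.sqrt(2.0))
            - 0.5 * (N - 1) * np.log(np.pi)
            + gammaln(N / 2.0 - 1.0)
            - 0.5 * (N - 1) * np.log(N)
            - (N - 2) * np.log(sigma_star)
            - np.log(sigma_max) - np.log(mu_max))
```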

The computation of p_N(η|M₀) is complete. However, we want to point out that there is a way to arrive at (35) and (36) while avoiding the tedious integrals above: namely, to utilize (22) and (29). The Hessian matrix, as defined in (19), is easily calculated in this case:

$\begin{matrix}{{\nabla{\nabla{L\left( {\mu_{0}^{*},\sigma_{\eta\; 0}^{*}} \right)}}} = \begin{bmatrix}\frac{N}{\sigma_{\eta\; 0}^{*2}} & 0 \\0 & \frac{2N}{\sigma_{\eta\; 0}^{*2}}\end{bmatrix}} & (37)\end{matrix}$

Substitute (29) and (37) into (22):

$\begin{matrix}\begin{matrix}{{p_{N}^{\prime}\left( \eta \middle| M_{0} \right)} \approx {\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{\eta\; 0_{\max}}}\frac{2{\pi\sigma}_{\eta\; 0}^{*2}}{\sqrt{2}N}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta\; 0}^{*N}}{\mathbb{e}}^{{- N}/2}}} \\{= {\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{\eta\; 0_{\max}}}\frac{\sigma_{\eta\; 0}^{*}}{\sqrt{N}}\frac{\sigma_{\eta\; 0}^{*}}{\sqrt{2N}}2\pi\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta\; 0}^{*N}}{\mathbb{e}}^{{- N}/2}}} \\{= {\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{\eta\; 0_{\max}}}\frac{\sigma_{\eta\; 0}^{*}}{\sqrt{N}}\frac{\sigma_{\eta\; 0}^{*}}{\sqrt{2N}}2\pi\;{p_{N}^{*}\left( {\left. \eta \middle| \mu_{0}^{*} \right.,\sigma_{\eta\; 0}^{*},M_{0}} \right)}}} \\{= {\frac{{\Delta\mu}_{0}}{\mu_{0_{\max}}}\frac{{\Delta\sigma}_{\eta\; 0}}{\sigma_{\eta\; 0_{\max}}}2\pi\;{p_{N}^{*}\left( {\left. \eta \middle| \mu_{0}^{*} \right.,\sigma_{\eta\; 0}^{*},M_{0}} \right)}}}\end{matrix} & (38)\end{matrix}$

Comparing (38) with (36), the two differ by a factor of e, which comes from the Stirling-series expansion of Γ(N/2 − 1) in (34).

We now consider the case where there is a peak in the window. Suppose the peak has a line shape described by x_i, which peaks near the center of the window. The observed data within the window may be written as:

$$s(t_j) = a_1 x(t_j - t_0) + \eta(t_j), \qquad j = 1 \ldots N, \qquad (39)$$

where the amplitude (a₁) and arrival time (t₀) of the wavelet, as well as the noise amplitude σ_{η1} and mean μ₁, are assumed to be unknown. The marginalization in this case looks like:

$$p(\eta \mid M_1) = \int\!\!\int\!\!\int da_1\, d\mu_1\, d\sigma_{\eta 1}\; p_N(\eta \mid a_1, \mu_1, \sigma_{\eta 1}, t_0, M_1)\, \frac{1}{a_{1_{\max}}}\, \frac{1}{\mu_{1_{\max}}}\, \frac{1}{\sigma_{\eta 1_{\max}}}, \qquad (40)$$

where:

$$p_N(\eta \mid a_1, \mu_1, \sigma_{\eta 1}, t_0, M_1) = \frac{1}{\left(\sqrt{2\pi}\,\sigma_{\eta 1}\right)^{N}} \exp\!\left(-\frac{1}{2\sigma_{\eta 1}^2} \sum_{i=1}^{N}\left(s_i - a_1 x_i - \mu_1\right)^2\right) \qquad (41)$$

is the realization of the likelihood function (9). As in the no-peak case, we complete the square; letting

$$\mu_* = \frac{1}{N}\sum\left(s_i - a_1 x_i\right), \qquad \sigma_{\eta^*}^2 = \frac{1}{N}\sum\left(s_i - a_1 x_i - \mu_*\right)^2,$$

one may get:

$$\begin{aligned} p(\eta \mid M_1) &= \int\!\!\int\!\!\int da_1\, d\mu_1\, d\sigma_{\eta 1}\, \frac{1}{\left(\sqrt{2\pi}\,\sigma_{\eta 1}\right)^{N}}\, e^{-\frac{N(\mu_1 - \mu_*)^2}{2\sigma_{\eta 1}^2}}\, e^{-\frac{N\sigma_{\eta^*}^2}{2\sigma_{\eta 1}^2}}\, \frac{1}{a_{1_{\max}}}\, \frac{1}{\mu_{1_{\max}}}\, \frac{1}{\sigma_{\eta 1_{\max}}} \\ &= \frac{1}{\left(\sqrt{2\pi}\right)^{N-1}\sqrt{N}} \int da_1\, d\sigma_{\eta 1}\, \frac{1}{\sigma_{\eta 1}^{N-1}}\, e^{-\frac{N\sigma_{\eta^*}^2}{2\sigma_{\eta 1}^2}}\, \frac{1}{a_{1_{\max}}}\, \frac{1}{\mu_{1_{\max}}}\, \frac{1}{\sigma_{\eta 1_{\max}}} \\ &= \int da_1\, \frac{\Gamma\!\left(\frac{N}{2} - 1\right)}{2\sqrt{2}\,(N\pi)^{\frac{N-1}{2}}\,\sigma_{\eta^*}^{N-2}}\, \frac{1}{a_{1_{\max}}}\, \frac{1}{\sigma_{\eta 1_{\max}}}\, \frac{1}{\mu_{1_{\max}}}. \end{aligned} \qquad (42)$$

Note that if we let (a₁*, μ₁*, σ_{η1}*) be the point where the likelihood is maximized (the derivation of (a₁*, μ₁*, σ_{η1}*) will be given momentarily), then

$$\begin{aligned} \sigma_{\eta^*}^2 &= \frac{1}{N}\sum\left(s_i - a_1 x_i - \mu_*\right)^2 \\ &= \frac{1}{N}\sum\left(s_i - a_1^* x_i - \mu_1^* - (a_1 - a_1^*)x_i - (\mu_* - \mu_1^*)\right)^2 \\ &= \sigma_{\eta 1}^{*2} + \sigma_x^2\left[a_1 - a_1^* - \frac{\sum\left(s_i - a_1^* x_i - \mu_1^*\right)\left(x_i - \bar{x}\right)}{\sigma_x^2 N}\right]^2 - \frac{\left[\sum\left(s_i - a_1^* x_i - \mu_1^*\right)\left(x_i - \bar{x}\right)\right]^2}{\sigma_x^2 N^2} \\ &= A\left(a_1 - a_1'\right)^2 + B, \end{aligned} \qquad (43)$$

where

$$\bar{x} = \frac{\sum x_i}{N}, \qquad \sigma_x^2 = \frac{1}{N}\sum\left(x_i - \bar{x}\right)^2, \qquad A = \sigma_x^2, \qquad B = \sigma_{\eta 1}^{*2} - \frac{\left[\sum\left(s_i - a_1^* x_i - \mu_1^*\right)\left(x_i - \bar{x}\right)\right]^2}{\sigma_x^2 N^2}, \qquad a_1' = a_1^* + \frac{\sum\left(s_i - a_1^* x_i - \mu_1^*\right)\left(x_i - \bar{x}\right)}{\sigma_x^2 N}. \qquad (44)$$

The marginalization may then be written as:

$$p(\eta \mid M_1) = \int da_1\, \frac{\Gamma\!\left(\frac{N}{2} - 1\right)}{2\sqrt{2}\,(N\pi)^{\frac{N-1}{2}}\left[A\left(a_1 - a_1'\right)^2 + B\right]^{\frac{N-2}{2}}}\, \frac{1}{a_{1_{\max}}}\, \frac{1}{\sigma_{\eta 1_{\max}}}\, \frac{1}{\mu_{1_{\max}}}. \qquad (45)$$

Changing variable t = (a₁ − a₁')² gives:

$$p(\eta \mid M_1) = \frac{1}{4\sqrt{2}\,\pi^{\frac{N}{2}-1}\, N^{\frac{N-1}{2}}}\, \frac{B^{-\frac{N}{2}+\frac{3}{2}}}{\sqrt{A}}\, \Gamma\!\left(\frac{N}{2} - \frac{3}{2}\right)\, \frac{1}{a_{1_{\max}}}\, \frac{1}{\sigma_{\eta 1_{\max}}}\, \frac{1}{\mu_{1_{\max}}}. \qquad (46)$$

Now, having computed p(η|M₀) and p(η|M₁), one may substitute them into (10) to see which model is preferable.

Up to this point, the model comparison is complete: we have derived equations (33) and (46), and equation (10) can be used to compare the two hypotheses. Yet, in parallel with what we did in the M₀ case, we can also arrive at an equation similar to equation (46) by computing the Hessian matrix; this, however, requires (a₁*, μ₁*, σ_{η1}*), the best estimates of (a₁, μ₁, σ_{η1}).

2.1.2 Parameter Fitting for Gaussian Noise

Once it is determined which model is preferred, parameters may be fitted by maximizing the appropriate likelihood.

For the case where there is no peak in the window, the likelihood (9) is simply:

$$p_N(\eta \mid \mu_0, \sigma_{\eta 0}, M_0) = \frac{1}{(2\pi)^{N/2}\,\sigma_{\eta 0}^{N}} \exp\!\left(-\frac{1}{2\sigma_{\eta 0}^2} \sum_{j=1}^{N}\left(\eta(t_j) - \mu_0\right)^2\right), \qquad (47)$$

where

$$\eta(t_j) = s(t_j), \qquad (48)$$

as only noise is assumed to be present. As was pointed out in the previous section, (47) is maximized at

$$\mu_0^* = \frac{1}{N}\sum s_i, \qquad \sigma_{\eta 0}^{*2} = \frac{1}{N}\sum\left(s_i - \mu_0^*\right)^2. \qquad (49)$$

The uncertainties are, as one can see from the Hessian matrix (37):

$\begin{matrix}{{{\Delta\mu}_{0} = \frac{\sigma_{\eta 0}^{*}}{\sqrt{N}}}{{\Delta\sigma}_{\eta 0} = \frac{\sigma_{\eta 0}^{*}}{\sqrt{2N}}}} & (50)\end{matrix}$

If it is preferred that there is a peak in the window, then the likelihood function (9) looks like:

$$p_N(\eta \mid a_1, \mu_1, \sigma_{\eta 1}, M_1) = \frac{1}{\left(\sqrt{2\pi}\,\sigma_{\eta 1}\right)^{N}} \exp\!\left(-\frac{1}{2\sigma_{\eta 1}^2} \sum_{i=1}^{N}\eta_i^2\right) = \frac{1}{\left(\sqrt{2\pi}\,\sigma_{\eta 1}\right)^{N}} \exp\!\left(-\frac{1}{2\sigma_{\eta 1}^2} \sum_{i=1}^{N}\left(s_i - a_1 x_i - \mu_1\right)^2\right). \qquad (51)$$

The natural logarithm of the likelihood is:

$$L = \ln\left[p_N(\eta \mid a_1, \mu_1, \sigma_{\eta 1}, M_1)\right] = -\frac{N}{2}\ln(2\pi) - N\ln(\sigma_{\eta 1}) - \frac{1}{2\sigma_{\eta 1}^2}\sum_{i=1}^{N}\left(s_i - a_1 x_i - \mu_1\right)^2. \qquad (52)$$

In order to maximize this value, we set:

$$\frac{\partial L}{\partial a_1} = -\frac{1}{\sigma_{\eta 1}^2}\sum_{i=1}^{N}\left(s_i - a_1 x_i - \mu_1\right)(-x_i) = 0, \qquad \frac{\partial L}{\partial \mu_1} = -\frac{1}{\sigma_{\eta 1}^2}\sum_{i=1}^{N}\left(s_i - a_1 x_i - \mu_1\right)(-1) = 0, \qquad \frac{\partial L}{\partial \sigma_{\eta 1}} = -\frac{N}{\sigma_{\eta 1}} + \frac{1}{\sigma_{\eta 1}^3}\sum_{i=1}^{N}\left(s_i - a_1 x_i - \mu_1\right)^2 = 0. \qquad (53)$$

Solving these equations gives:

$$a_1^* = \frac{\overline{sx} - \bar{s}\,\bar{x}}{\overline{x^2} - \bar{x}^2} = \frac{\overline{\left(s - \bar{s}\right)\left(x - \bar{x}\right)}}{\overline{\left(x - \bar{x}\right)^2}} = \frac{\overline{\left(s - \bar{s}\right)\left(x - \bar{x}\right)}}{\sigma_x^2}, \qquad \mu_1^* = \bar{s} - a_1^*\bar{x}, \qquad \sigma_{\eta 1}^{*2} = \overline{\left(s_i - \bar{s} - a_1^*\left(x_i - \bar{x}\right)\right)^2} = \sigma_s^2 - a_1^{*2}\sigma_x^2. \qquad (54)$$

A short numerical sketch of this fit appears below.
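A minimal numerical sketch of the maximum-likelihood fit (54), using a window s and unit-area template x as in the earlier illustration (all names are illustrative):

```python
import numpy as np

# Maximum-likelihood fit of equation (54): matched-filter amplitude,
# baseline, and noise variance for the peak-plus-noise model M1.
def fit_peak_gaussian(s, x):
    s, x = np.asarray(s, float), np.asarray(x, float)
    var_x = np.mean((x - x.mean()) ** 2)                       # sigma_x^2
    cov_sx = np.mean((s - s.mean()) * (x - x.mean()))
    a1 = cov_sx / var_x                                        # a1*
    mu1 = s.mean() - a1 * x.mean()                             # mu1*
    var_eta1 = np.mean((s - s.mean()) ** 2) - a1 ** 2 * var_x  # sigma_eta1*^2
    return a1, mu1, var_eta1
```

The uncertainties for these estimates then follow from the Hessian, as derived in equations (56)-(59) below.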

The log of the likelihood function at (α₁*,μ₁*,σ_(η1)*) is simply:

$\begin{matrix}{{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}} \right)} = {{{- N}\;{\log\left( \sigma_{\eta 1}^{*} \right)}} - {\frac{N}{2}{\log\left( {2\pi} \right)}} - \frac{N}{2}}} & (55)\end{matrix}$

The elements in the Hessian matrix ∇∇L(a₁,μ₁,σ_{η1}) evaluated at (a₁*,μ₁*,σ_{η1}*) are:

$$\begin{aligned} -\left.\frac{\partial^2 L}{\partial a_1^2}\right|_{(a_1^*,\mu_1^*,\sigma_{\eta 1}^*)} &= \frac{1}{\sigma_{\eta 1}^{*2}}\sum_{i=1}^{N} x_i^2 = \frac{N\left(\sigma_x^2 + \bar{x}^2\right)}{\sigma_{\eta 1}^{*2}}, \\ -\left.\frac{\partial^2 L}{\partial \mu_1^2}\right|_{(a_1^*,\mu_1^*,\sigma_{\eta 1}^*)} &= \frac{1}{\sigma_{\eta 1}^{*2}}\sum_{i=1}^{N}(1) = \frac{N}{\sigma_{\eta 1}^{*2}}, \\ -\left.\frac{\partial^2 L}{\partial \sigma_{\eta 1}^2}\right|_{(a_1^*,\mu_1^*,\sigma_{\eta 1}^*)} &= -\frac{N}{\sigma_{\eta 1}^{*2}} + \frac{3}{\sigma_{\eta 1}^{*4}}\sum_{i=1}^{N}\left(s_i - a_1^* x_i - \mu_1^*\right)^2 = \frac{2N}{\sigma_{\eta 1}^{*2}}, \\ -\left.\frac{\partial^2 L}{\partial a_1\,\partial \mu_1}\right|_{(a_1^*,\mu_1^*,\sigma_{\eta 1}^*)} &= \frac{1}{\sigma_{\eta 1}^{*2}}\sum_{i=1}^{N} x_i, \\ -\left.\frac{\partial^2 L}{\partial \mu_1\,\partial \sigma_{\eta 1}}\right|_{(a_1^*,\mu_1^*,\sigma_{\eta 1}^*)} &= \frac{2}{\sigma_{\eta 1}^{*3}}\sum_{i=1}^{N}\left(s_i - a_1^* x_i - \mu_1^*\right) = 0, \\ -\left.\frac{\partial^2 L}{\partial a_1\,\partial \sigma_{\eta 1}}\right|_{(a_1^*,\mu_1^*,\sigma_{\eta 1}^*)} &= \frac{2}{\sigma_{\eta 1}^{*3}}\sum_{i=1}^{N}\left(s_i - a_1^* x_i - \mu_1^*\right) x_i = 0. \end{aligned} \qquad (56)$$

The second-to-last term is obviously zero because μ₁* = s̄ − a₁* x̄. The last term is also zero because it is proportional to the correlation between the residual, which is presumably white noise with zero mean, and the target x_i. Nevertheless, it can be shown directly as follows:

$\begin{matrix}\begin{matrix}{{\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}^{*}x_{i}} - \mu_{1}^{*}} \right)x_{i}}} = {\sum\limits_{i = 1}^{N}{\left( {s_{i} - {a_{1}^{*}x_{i}} - \overset{\_}{s} + {a_{1}^{*}\overset{\_}{x}}} \right)x_{i}}}} \\{= {{\sum\limits_{i = 1}^{N}{s_{i}x_{i}}} - {a_{1}^{*}{\sum\limits_{i = 1}^{N}x_{i}^{2}}} -}} \\{{\overset{\_}{s}{\sum\limits_{i = 1}^{N}x_{i}}} + {a_{1}^{*}\overset{\_}{x}{\sum\limits_{i = 1}^{N}x_{i}}}} \\{= {N\left( {\overset{\_}{sx} - {a_{1}^{*}\overset{\_}{x^{2}}} - {\overset{\_}{s}\overset{\_}{x}} + {a_{1}^{*}{\overset{\_}{x}}^{2}}} \right)}} \\{= {N\left( {\overset{\_}{sx} - {\overset{\_}{s}\overset{\_}{x}} - {a_{1}^{*}\left( {\overset{\_}{x^{2}} - {\overset{\_}{x}}^{2}} \right)}} \right)}} \\{= {N\left( {\overset{\_}{\left( {s - \overset{\_}{s}} \right)\left( {x - \overset{\_}{x}} \right)} - {a_{1}^{*}\sigma_{x}^{2}}} \right)}} \\{= 0}\end{matrix} & (57)\end{matrix}$

With all elements of the Hessian matrix computed, we can now write out the Hessian matrix ∇∇L(a₁*,μ₁*,σ_{η1}*):

$\begin{matrix}{{\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta 1}^{*}} \right)}}} = {\begin{bmatrix}\frac{N\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}{\sigma_{\eta 1}^{*2}} & \frac{\sum\limits_{i = 1}^{N}x_{i}}{\sigma_{\eta 1}^{*2}} & 0 \\\frac{\sum\limits_{i = 1}^{N}x_{i}}{\sigma_{\eta 1}^{*2}} & \frac{N}{\sigma_{\eta 1}^{*2}} & 0 \\0 & 0 & \frac{2N}{\sigma_{\eta 1}^{*2}}\end{bmatrix}.}} & (58)\end{matrix}$

From this, one can obtain the uncertainties:

$\begin{matrix}{{{\Delta\; a_{1}} = \frac{\sigma_{\;{\eta 1}}^{*}}{\sqrt{N\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}}}{{\Delta\;\mu_{1}} = \frac{\sigma_{\eta 1}^{*}}{\sqrt{N}}}{{{\Delta\;\sigma_{\eta 1}} = \frac{\sigma_{\eta 1}^{*}}{\sqrt{2N}}},}} & (59)\end{matrix}$

From (22), we have:

$\begin{matrix}\begin{matrix}{{p^{\prime}\left( \eta \middle| M_{1} \right)} \approx {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\{\frac{\left( {2\pi} \right)^{3/2}}{\sqrt{\text{det}\;\left( {\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\; 1}^{*}} \right)}}} \right)}}{\mathbb{e}}^{L{({a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\; 1}^{*}})}}}\end{matrix} & (60)\end{matrix}$

in which

$\begin{matrix}\begin{matrix}{{\det\left\lbrack {\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\; 1}^{*}} \right)}}} \right\rbrack} = {\frac{2{N^{3}\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}}{\sigma_{\eta 1}^{*6}} - \frac{2{N\left( {\sum\limits_{i = 1}^{N}x_{i}} \right)}^{2}}{\sigma_{\eta 1}^{*6}}}} \\{= \frac{{2{N^{3}\left( {\sigma_{x}^{2} + {\overset{\_}{x}}^{2}} \right)}} - {2N^{3}{\overset{\_}{x}}^{2}}}{\sigma_{\eta 1}^{*6}}} \\{= \frac{2N^{3}\sigma_{x}^{2}}{\sigma_{\eta 1}^{*6}}}\end{matrix} & (61)\end{matrix}$

and:

$\begin{matrix}{\frac{1}{\sqrt{\det\left\lbrack {\nabla{\nabla{L\left( {a_{1}^{*},\mu_{1}^{*},\sigma_{\eta\; 1}^{*}} \right)}}} \right\rbrack}}\begin{matrix}{= {\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\;}}}} \\{\propto {\Delta\; a_{1}{\Delta\mu}_{1}{\Delta\sigma}_{\eta 1}}}\end{matrix}} & (62)\end{matrix}$

Substitute (62) into (60):

$\begin{matrix}\begin{matrix}{{p^{\prime}\left( \eta \middle| M_{1} \right)} \approx {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\{\frac{\left( {2\pi} \right)^{3/2}\sigma_{\eta 1}^{*3}}{\left( {2N^{3}} \right)^{1/2}\sigma_{x}}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}} \\{= {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\;}}}} \\{\left( {2\pi} \right)^{3/2}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}} \\{= {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\;}}}} \\{\left( {2\pi} \right)^{3/2}{p^{*}\left( {\left. \eta \middle| a_{1}^{*} \right.,\mu_{1}^{*},\sigma_{\eta 1}^{*},M_{1}} \right)}} \\{\propto {\frac{\Delta\; a_{1}}{a_{1_{\max}}}\frac{{\Delta\mu}_{1}}{\mu_{1_{\max}}}\frac{{\Delta\sigma}_{\eta 1}}{\sigma_{{\eta 1}_{\max}}}\left( {2\pi} \right)^{3/2}{p^{*}\left( {\left. \eta \middle| a_{1}^{*} \right.,\mu_{1}^{*},\sigma_{\eta 1}^{*},M_{1}} \right)}}}\end{matrix} & (63)\end{matrix}$

To compare with (46), we first notice that the second term in B in (46) is actually zero (by (57)), i.e. B = σ_{η1}*²; thus:

$\begin{matrix}\begin{matrix}{{p\left( \eta \middle| M_{1} \right)} = {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\{\frac{1}{4\sqrt{2}\pi^{\frac{N}{2} - 1}N^{\frac{N - 1}{2}}}\frac{\sigma_{\eta 1}^{*3}}{\sigma_{x}\sigma_{\eta 1}^{*N}}{\Gamma\left( {\frac{N}{2} - \frac{3}{2}} \right)}}\end{matrix} & (64)\end{matrix}$

Use the Stirling series again,

$\begin{matrix}\begin{matrix}{{\Gamma\left( {\frac{N}{2} - \frac{3}{2}} \right)} \approx {{{\mathbb{e}}^{{- \frac{N}{2}} + \frac{3}{2}}\left( {\frac{N}{2} - \frac{3}{2}} \right)}^{\frac{N}{2} - \frac{3}{2} - \frac{1}{2}}\sqrt{2\pi}}} \\{\approx {{{\mathbb{e}}^{{- \frac{N}{2}} + \frac{3}{2}}\left( \frac{N}{2} \right)}^{\frac{N}{2} - 2}\sqrt{2\pi}}}\end{matrix} & (65)\end{matrix}$

this makes (64) look like:

$\begin{matrix}\begin{matrix}{{p\left( \eta \middle| M_{1} \right)} \approx {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}}} \\{\frac{1}{4\sqrt{2}\pi^{\frac{N}{2} - 1}N^{\frac{N - 1}{2}}}\frac{\sigma_{\eta 1}^{*3}}{\sigma_{x}\sigma_{\eta 1}^{*N}}{{\mathbb{e}}^{{- \frac{N}{2}} + \frac{3}{2}}\left( \frac{N}{2} \right)}^{\frac{N}{2} - 2}\sqrt{2\pi}} \\{= {\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;\sigma_{x}^{2}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{{2N}\;}}}} \\{\left( {2\pi} \right)^{3/2}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}\frac{{\mathbb{e}}^{3/2}}{2}} \\{\propto {\frac{\Delta\; a_{1}}{a_{1_{\max}}}\frac{{\Delta\mu}_{1}}{\mu_{1_{\max}}}\frac{{\Delta\sigma}_{\eta 1}}{\sigma_{{\eta 1}_{\max}}}\left( {2\pi} \right)^{3/2}{p^{*}\left( {\left. \eta \middle| a_{1}^{*} \right.,\mu_{1}^{*},\sigma_{\eta 1}^{*},M_{1}} \right)}}} \\{\frac{{\mathbb{e}}^{3/2}}{2}}\end{matrix} & (66)\end{matrix}$

This is just like (63) except for the factor

$\frac{e^{3/2}}{2},$ which likely arises from the Stirling series.

However, with both (38) and (63) calculated, (10) can be written quite neatly:

$\begin{matrix}\begin{matrix}{\frac{p\left( H_{1} \middle| n \right)}{p\left( H_{0} \middle| n \right)} = \frac{p\left( {\left. M_{1} \middle| n \right.,t_{0}} \right)}{p\left( {\left. M_{0} \middle| n \right.,t_{0}} \right)}} \\{= \frac{p\left( {\left. \eta \middle| M_{1} \right.,t_{0}} \right)}{p\left( {\left. \eta \middle| M_{0} \right.,t_{0}} \right)}} \\{\approx \frac{\frac{1}{a_{1_{\max}}}\frac{1}{\mu_{1_{\max}}}\frac{1}{\sigma_{{\eta 1}_{\max}}}\frac{\left( {2\pi} \right)^{3/2}\sigma_{\eta 1}^{*3}}{\left( {2N^{3}} \right)^{1/2}\sigma_{x}}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 1}^{*N}}{\mathbb{e}}^{{- N}/2}}{\frac{1}{\mu_{0_{\max}}}\frac{1}{\sigma_{{\eta 0}_{\max}}}\frac{2{\pi\sigma}_{\eta 0}^{*2}}{\sqrt{2}N}\frac{1}{\left( {2\pi} \right)^{N/2}\sigma_{\eta 0}^{*N}}{\mathbb{e}}^{{- N}/2}}} \\{= {\frac{1}{a_{1_{\max}}}\frac{\sigma_{\eta 1}^{*}}{\sqrt{N\;\sigma_{x}^{2}}}\sqrt{2\pi}\left( \frac{\sigma_{\eta 0}^{*}}{\sigma_{\eta 1}^{*}} \right)^{N - 2}}} \\{\propto {\frac{\Delta\; a_{1}}{a_{1_{\max}}}\sqrt{2\pi}\left( \frac{\sigma_{\eta 0}^{*2}}{\sigma_{\eta 1}^{*2}} \right)^{\frac{N - 2}{2}}}}\end{matrix} & (67)\end{matrix}$

It depends on the amplitude of the peak like (a₁*)^(N−2) when the window is on top of a peak, where σ_{η0}*² = σ_s² ≈ a₁*²σ_x² + σ_{η1}*², while σ_{η1}*² is the best estimate of the noise variance. It is important to note that only three quantities must ever be computed as the window moves (a sketch of the resulting scan follows equation (68)):

$\begin{matrix}{{\overset{\_}{s} = {\frac{1}{N}{\sum\limits_{i = 1}^{N}s_{i}}}}{\sigma_{s}^{2} = {\frac{1}{N}{\sum\limits_{i = 1}^{N}\left( {s_{i} - \overset{\_}{s}} \right)^{2}}}}{\overset{\_}{\left( {s - \overset{\_}{s}} \right)\left( {x - \overset{\_}{x}} \right)} = {\frac{1}{N}{\sum\limits_{i = 1}^{N}{\left( {s_{i} - \overset{\_}{s}} \right)\left( {x_{i} - \overset{\_}{x}} \right)}}}}} & (68)\end{matrix}$
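Here is a sketch of a sliding-window detector built from the three running statistics (68) and the asymptotic odds ratio (67). The name a1_max is the assumed uniform-prior width for the amplitude, and the small floors merely guard the logarithms; this is an illustration, not a definitive implementation.

```python
import numpy as np

# Log of the odds ratio (67) at every window position, computed from
# the three statistics in (68). Thresholding the output flags windows
# that likely contain a peak.
def log_odds_scan(signal, x, a1_max):
    s = np.asarray(signal, float)
    x = np.asarray(x, float)
    N = len(x)
    dx = x - x.mean()
    var_x = np.mean(dx ** 2)
    out = np.empty(len(s) - N + 1)
    for k in range(len(out)):
        w = s[k:k + N]
        s_bar = w.mean()                                 # first statistic in (68)
        var_s = np.mean((w - s_bar) ** 2)                # second statistic in (68)
        cov_sx = np.mean((w - s_bar) * dx)               # third statistic in (68)
        a1 = cov_sx / var_x                              # a1* from (54)
        var_eta1 = max(var_s - a1 ** 2 * var_x, 1e-300)  # sigma_eta1*^2
        var_eta0 = max(var_s, 1e-300)                    # sigma_eta0*^2
        out[k] = (0.5 * np.log(2.0 * np.pi) - np.log(a1_max)
                  + 0.5 * np.log(var_eta1 / (N * var_x))
                  + 0.5 * (N - 2) * np.log(var_eta0 / var_eta1))
    return out
```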

2.2 Preliminary Results for Finding Peaks in Poisson Noise

2.2.1 Comparing Models

The basic logic is similar to the case of white Gaussian noise. The noise is white because the essence of the Poisson process is that we are counting discrete events that are uncorrelated. The probability that we observe n events in the time interval [t, t+Δt] is given by the single-step Poisson distribution:

$$p_1(n \mid r) = \frac{r^n e^{-r}}{n!}, \qquad (69)$$

where r is the rate. We note that the expectation value of n is ⟨n⟩ = Σ n p(n|r) = r. The rate will depend upon the local signal strength. If no signal is present (there is only dark current), then the rate will be denoted r₀. The signal rides on top of the dark current and it is assumed that the local count rate is directly proportional to the signal:

$$r_i \equiv r(t_i) = r_0 + a\, x(t_i). \qquad (70)$$

We can try to estimate both r₀ and a from the same data, or we can estimate r₀ from data taken either from a separate time series, or from a region of the time series that is far from any peak.

The likelihood function is then:

$\begin{matrix}\begin{matrix}{{p\left( {\left. n \middle| a \right.,r_{0},M} \right)} = {\prod\limits_{i = 1}^{N}\;\frac{r_{i}^{n_{i}}{\mathbb{e}}^{- r_{i}}}{n_{i}!}}} \\{= {\prod\limits_{i = 1}^{N}\;\frac{\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}{\mathbb{e}}^{- {({{ax}_{i} + r_{0}})}}}{n_{i}!}}} \\{= {{\mathbb{e}}^{- {Nr}_{0}}{\mathbb{e}}^{- a}{\prod\limits_{i = 1}^{N}\;\frac{\left( {{ax}_{i} + r_{0}} \right)^{n_{i}}}{n_{i}!}}}}\end{matrix} & (71)\end{matrix}$where it is assumed that x_(i) has unit area:

$$\sum_{i=1}^{N} x_i = 1.$$
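A log-form sketch of the Poisson likelihood (71), with gammaln replacing the factorials for numerical safety (names illustrative):

```python
import numpy as np
from scipy.special import gammaln

# Poisson log-likelihood of equation (71) for counts n over the window,
# given dark rate r0 > 0 and amplitude a on the unit-area template x.
def poisson_log_like(n, x, a, r0):
    n = np.asarray(n, float)
    rate = a * np.asarray(x, float) + r0        # equation (70)
    return np.sum(n * np.log(rate) - rate - gammaln(n + 1))
```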

First, let's consider the case where there is no peak in the window, only the dark current due to electronic fluctuations. We may use the prior distribution

$\begin{matrix}{{p\left( {a,\left. r_{0} \middle| M_{0} \right.} \right)} = \frac{\delta(a)}{2r_{\max}}} & (72)\end{matrix}$

Marginalizing the likelihood function over (α,r₀) gives:

$$\begin{aligned} p(\mathbf{n} \mid M_0) &= \int\!\!\int p(\mathbf{n} \mid a, r_0, M_0)\, p(a, r_0 \mid M_0)\, da\, dr_0 \\ &= \int\!\!\int e^{-Nr_0}\, e^{-a} \prod_{i=1}^{N} \frac{\left(a x_i + r_0\right)^{n_i}}{n_i!}\, \frac{\delta(a)}{2 r_{\max}}\, da\, dr_0 \\ &= \frac{1}{2 r_{\max} \prod n_i!} \int_{0}^{r_{\max}} e^{-Nr_0}\, r_0^{\sum_{i=1}^{N} n_i}\, dr_0. \end{aligned} \qquad (73)$$

Changing variables,

$\begin{matrix}{Nr_{0} = t,\qquad dr_{0} = \frac{dt}{N},\qquad r_{0}^{\sum n_{i}} = \left( \frac{t}{N} \right)^{\sum n_{i}},} & (74)\end{matrix}$

then

$\begin{matrix}\begin{matrix}{{p\left( n \middle| M_{0} \right)} = {\frac{1}{2r_{\max}\prod n_{i}!}\int{e^{- t}\left( \frac{t}{N} \right)^{\sum\limits_{i = 1}^{N}n_{i}}\frac{dt}{N}}}} \\{= {\frac{1}{2r_{\max}N^{1 + \sum\limits_{i = 1}^{N}n_{i}}\prod n_{i}!}\int{e^{- t}\,t^{\sum\limits_{i = 1}^{N}n_{i}}\,dt}}} \\{= \frac{\left( \sum\limits_{i = 1}^{N}n_{i} \right)!}{2r_{\max}N^{1 + \sum\limits_{i = 1}^{N}n_{i}}\prod n_{i}!}}\end{matrix} & (75)\end{matrix}$

Before moving on to compute the evidence assuming a peak is present in the window, we note that we can do a parameter fitting on r₀:

$\begin{matrix}{{\hat{r}}_{0} = \frac{\sum n_{i}}{N}} & (76)\end{matrix}$

This estimate can now be used in the parameter fitting for the case where there is a peak in the window.
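In log space, (75) and (76) transcribe directly; a sketch is given below, where gammaln(k + 1) = log(k!) keeps the factorials finite and r_max is the prior half-width from (72).

    import numpy as np
    from scipy.special import gammaln

    def log_evidence_m0(n, r_max):
        # log of eq. (75), the evidence for the dark-current-only model
        n = np.asarray(n)
        S = n.sum()
        return (gammaln(S + 1.0) - np.log(2.0 * r_max)
                - (S + 1.0) * np.log(len(n)) - gammaln(n + 1.0).sum())

    def r0_hat(n):
        # eq. (76): the point estimate of the dark-current rate
        return np.mean(n)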

We now consider how to detect a peak using a given peak shape. The model M₁ in this case is that, within the window, there is a peak in the presence of dark current:

$\begin{matrix}\begin{matrix}{{p\left( n \middle| M_{1} \right)} = {\int\!\!\int{p\left( n \middle| a,r_{0},M_{1} \right)p\left( a,r_{0} \middle| M_{1} \right)\,da\,dr_{0}}}} \\{= {\int\!\!\int{e^{- Nr_{0}}e^{- a}\prod\limits_{i = 1}^{N}\frac{\left( {ax}_{i} + r_{0} \right)^{n_{i}}}{n_{i}!}\;\frac{1}{r_{\max}a_{\max}}\,da\,dr_{0}}}} \\{= {\frac{1}{r_{\max}a_{\max}\prod\limits_{i = 1}^{N}n_{i}!}\int\!\!\int{e^{- Nr_{0}}e^{- a}\prod\limits_{i = 1}^{N}\left( {ax}_{i} + r_{0} \right)^{n_{i}}\,da\,dr_{0}}}}\end{matrix} & (77)\end{matrix}$

The integral in (77) is over the parameter plane (a, r₀). The term

$\prod\limits_{i = 1}^{N}\left( {ax}_{i} + r_{0} \right)^{n_{i}}$

is problematic when the peak is comparable to the dark current. However, since we are looking for strong evidence of peaks in the spectrum, this product will be dominated by the peak term even if $ax_{i}$ is only slightly larger than $r_{0}$, owing to the power $n_{i}$, and vice versa. Thus we can neglect a very narrow region close to the diagonal of the (a, r₀)-plane, thereby approximating (77) as:

$\begin{matrix}\begin{matrix}{{p\left( n \middle| M_{1} \right)} \approx {p\left( n \middle| M_{1}^{\prime} \right)} = {p\left( n \middle| \text{only a peak in the window} \right)}} \\{\quad + \;{p\left( n \middle| \text{only dark current in the window} \right)}} \\{= {p\left( n \middle| \text{only a peak in the window} \right)} + {p\left( n \middle| M_{0} \right)}}\end{matrix} & (78)\end{matrix}$

where model M₁′ is an approximation of model M₁: there is either only dark current or only a peak in the window. Hence the ratio in (10) becomes:

$\begin{matrix}{{\frac{p\left( H_{1} \middle| n \right)}{p\left( H_{0} \middle| n \right)} \approx \frac{p\left( n \middle| M_{1}^{\prime} \right)}{p\left( n \middle| M_{0} \right)}} = {1 + \frac{p\left( n \middle| \text{only a peak in the window} \right)}{p\left( n \middle| M_{0} \right)}}} & (79)\end{matrix}$

where p(n | only a peak in the window) can be computed in the same fashion as p(n | M₀), for which only dark current is assumed in the window, but with a different prior:

$\begin{matrix}{{p\left( a,r_{0} \middle| \text{only peak} \right)} = \frac{\delta\left( r_{0} \right)}{2a_{\max}}} & (80)\end{matrix}$

This results in:

$\begin{matrix}{{p\left( n \middle| \text{only a peak in the window} \right)} = \frac{\left( \prod x_{i}^{n_{i}} \right)\left( \sum n_{i} \right)!}{2a_{\max}\prod n_{i}!}} & (81)\end{matrix}$

We are then ready to evaluate the ratio in (79).
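In the ratio of (81) to (75), the factorials cancel, so the log odds can be computed directly. A sketch follows, assuming x_i > 0 across the window; logaddexp evaluates the final 1 + exp(·) stably.

    import numpy as np

    def log_odds_poisson(n, x, r_max, a_max):
        # log of the ratio (79) for one window of counts n against template x
        n = np.asarray(n, dtype=float)
        x = np.asarray(x, dtype=float)
        S = n.sum()
        # log p(n | only a peak) - log p(n | M0): the (sum n_i)! and n_i! cancel
        log_term = ((n * np.log(x)).sum() + (S + 1.0) * np.log(len(n))
                    + np.log(r_max / a_max))
        return np.logaddexp(0.0, log_term)        # log(1 + exp(log_term))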

2.2.2 Parameter Fitting

With the ratio (79) computed, if model M₁′ is preferred, the parameter of interest, i.e., the amplitude, can be estimated by maximizing the likelihood function (71), using the dark current estimated from the tail of the spectrum where there is no peak. The natural logarithm of the likelihood function (71) is:

$\begin{matrix}\begin{matrix}{L = {\log\left\lbrack p\left( n \middle| a,{\hat{r}}_{0},M_{1} \right) \right\rbrack}} \\{= {- N{\hat{r}}_{0} - a + {\sum{n_{i}\log\left( {\hat{r}}_{0} + {ax}_{i} \right)}} - {\sum{\log\left( n_{i}! \right)}}}}\end{matrix} & (82)\end{matrix}$

To maximize it:

$\begin{matrix}{\frac{\partial L}{\partial a} = {- 1 + {\sum{n_{i}\frac{x_{i}}{{\hat{r}}_{0} + {ax}_{i}}}} = 0}} & (83)\end{matrix}$

Note that $ax_{i} \gg {\hat{r}}_{0}$ wherever there is a peak, so the ${\hat{r}}_{0}$ in the denominator is negligible, which results in:

$\begin{matrix}{a_{0} \approx {\sum\limits_{k = 1}^{N}n_{k}}} & (84)\end{matrix}$

This is our best estimate of the amplitude of the peak. It should be noted that this approximation is not good for very small peaks.
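One way to implement this fit, offered as a sketch rather than a definitive implementation: start from (84) and polish with Newton steps on the exact condition (83). The window is assumed to contain genuine counts, and r0 is the estimate from a peak-free tail.

    import numpy as np

    def fit_amplitude(n, x, r0, iters=50, tol=1e-9):
        # Maximize (82) over a: Newton iteration on dL/da = 0, eq. (83).
        n = np.asarray(n, dtype=float)
        x = np.asarray(x, dtype=float)
        a = n.sum()                            # eq. (84) as the initial guess
        for _ in range(iters):
            denom = r0 + a * x
            g = -1.0 + (n * x / denom).sum()   # dL/da, eq. (83)
            h = -(n * (x / denom) ** 2).sum()  # d2L/da2, always negative here
            step = g / h
            a -= step
            if abs(step) < tol:
                break
        return a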

3. Derivation of the Line Shape for TOF-SIMS Mass Peaks

The TOF-SIMS is a counting experiment, and it (typically) counts the arrival times of one ion at a time. For any given ionization pulse, there will be only a few ions of a given mass that reach the detector. Hence, the eventual pulse shape will be the accumulation of many independent experiments, with the origin of time t = 0 for each experiment being the arrival time of the beam pulse that initiates the ion formation. We consider a single ion of mass M that emerges from a typical beam pulse. After passing through what could be very complicated ion optics that perform charge/mass segregation, at a time t = t₁ (that is mass-dependent) this ion enters free flight with a velocity ν₀, which is also mass-dependent. We assume the ion enters free flight at z = 0 and travels a distance L to the detector along a linear trajectory z(t) = ν₀(t − t₁). We then ask what the PDF of arrival times for this ion would be if the PDF in the particle phase space at time t₁ is

$\begin{matrix}{f\left( z,v;t = t_{1} \right) = f_{0}\left( z,v \right) = \delta(z)g(v).} & (85)\end{matrix}$

Here, f(z, ν; t) dz dν is the probability at time t that the particle lies in an infinitesimal neighborhood of the point (z, ν) in phase space; we allow for an uncertainty in the ion's initial velocity, but g(ν) is assumed to be sharply peaked around ν₀. For f to be a PDF we must have ∫dν g(ν) = 1. This PDF evolves according to the Fokker-Planck equation, which in the present situation is simply

$\begin{matrix}{{\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial z}} = 0,} & (86)\end{matrix}$

implying that

$\begin{matrix}{f\left( z,v;t \right) = f_{0}\left( z - v\left( t - t_{1} \right),v \right) = \delta\left( z - v\left( t - t_{1} \right) \right)g(v).} & (87)\end{matrix}$

The PDF of arrival times at the detector placed at z = L is given by the flux of probability crossing this point, which is

$\begin{matrix}{{P(t)\,dt} = {\int{dv\left( v\,dt \right)f\left( z = L,v;t \right)}} = {\frac{L}{\left( t - t_{1} \right)^{2}}g\left( \frac{L}{t - t_{1}} \right)dt.}} & (88)\end{matrix}$

We suggest the use of a Gaussian for the velocity distribution g(ν) from maximum entropy arguments: when assigning a PDF for the outcome of what will be a single case of a repeated series of identical and independent experiments, one should choose the PDF that maximizes the number of possible outcomes consistent with the probability assignment. Thus, the choice of a Gaussian here is based on the desire to minimize bias in the outcome, not on physical arguments requiring 'thermalization' in the ionization process or the nature of the ion optics.

To summarize, our proposed peak (wavelet) shape is

$\begin{matrix}{{x\left( t;M \right)} = {\frac{L}{\left( t - t_{1}(M) \right)^{2}}\exp\left\lbrack - \frac{\left( \frac{L}{t - t_{1}(M)} - v_{0}(M) \right)^{2}}{2\sigma^{2}(M)} \right\rbrack.}} & (89)\end{matrix}$
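Evaluating (89) numerically is straightforward; in the sketch below, L, t1, v0 and sigma are placeholders for the mass-dependent flight length, free-flight start time, mean initial velocity, and velocity spread.

    import numpy as np

    def peak_shape(t, L, t1, v0, sigma):
        # Eq. (89); valid for arrival times t > t1.
        dt = np.asarray(t, dtype=float) - t1
        v = L / dt                             # velocity implied by arrival at t
        return (L / dt ** 2) * np.exp(-((v - v0) ** 2) / (2.0 * sigma ** 2))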

4. Results on TOF-SIMS

TOF-SIMS is an abbreviation for Time-of-Flight Secondary Ionization Mass Spectrometer. It uses a high-energy primary ion beam to probe the sample surface. Because the instrument counts discrete ion arrivals, the noise in this case is Poisson.

Here we illustrate our results for finding peaks in a spectrum. FIG. 2 shows the workflow for a TOF-SIMS mass spectrum (silver foil, gallium beam). In FIG. 2a, a segment of data containing only one peak from a spectrum is plotted. A window is then run through these data point by point, with the ratio of the two hypotheses (i.e., equation (79)) being computed for each window, resulting in the curve shown in FIG. 2b. Meanwhile, the maximum-likelihood estimate of the amplitude and the maximized likelihood are also computed. The maximized likelihood is plotted as the curve in FIG. 2c. Then, a threshold is set for the ratio (e.g., a threshold of 20 was used for the illustrated case), with the points above this threshold marked as dots in FIG. 2b. Points at the corresponding positions are also marked in FIG. 2c. Of those dots in the likelihood plot, illustrated in FIG. 2c, only a fraction (i.e., the diamonds) are selected and fitted locally to a parabola, from which the position of the peak is determined. Then the amplitude of the peak is chosen from the estimated amplitudes according to the peak position. The peak so determined is overlaid on the original data in FIG. 2d, scaled so that it fits in the panel.
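The following sketch strings the pieces above together along the lines of FIG. 2. It reuses the hypothetical log_odds_poisson and fit_amplitude functions sketched earlier, and it simplifies the illustrated workflow: a single peak is assumed, and the windows that survive the threshold are taken to be consecutive.

    import numpy as np

    def scan_spectrum(counts, template, r_max, a_max, r0, ratio_threshold=20.0):
        # Slide the window point by point; keep windows whose odds ratio (79)
        # clears the threshold; refine with a parabola fit as in FIG. 2c.
        counts = np.asarray(counts, dtype=float)
        N = len(template)
        pos, amps, lls = [], [], []
        for i in range(len(counts) - N + 1):
            w = counts[i:i + N]
            if log_odds_poisson(w, template, r_max, a_max) > np.log(ratio_threshold):
                a = fit_amplitude(w, template, r0)
                # maximized log-likelihood (82), dropping the constant sum log(n_i!)
                ll = -N * r0 - a + (w * np.log(r0 + a * template)).sum()
                pos.append(i); amps.append(a); lls.append(ll)
        if len(lls) < 3:
            return None                        # nothing cleared the threshold
        k = min(max(int(np.argmax(lls)), 1), len(lls) - 2)
        y0, y1, y2 = lls[k - 1], lls[k], lls[k + 1]
        shift = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)   # parabola vertex offset
        return pos[k] + shift, amps[k]         # sub-sample position and amplitude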

FIG. 3 depicts a segment of a TOF-SIMS spectrum of angiotensin (silver foil, gallium beam), with the y-axis corresponding to counts and the x-axis corresponding to time points. The data segment shows four consecutive peaks (corresponding to the raw data) arising from isotopes, along with the peaks identified after applying the methods of the invention.


1. An automated method of data processing and evaluation comprising the steps of: (a) selecting a data set having a plurality of peaks contained therein; (b) developing a hypothesis which predicts a specified number of peaks present within a given window of said data set; (c) testing the hypothesis that the specified number of peaks is present within the given window; (d) maximizing a likelihood function to estimate the position and amplitude of said peaks present within the given window; (e) repeating said steps (b)–(d) for each of a plurality of windows with each of said windows partially overlapping a previous one of said windows, wherein peak amplitudes and corresponding positions in said data set are generated using the position and amplitude of said peaks present within each given window so-estimated by each said step of maximizing; and (f) outputting said peak amplitudes and said corresponding positions to an output device.
2. The method of claim 1, wherein said step of outputting comprises the step of displaying said peak amplitudes and said corresponding positions.
3. The method of claim 2, wherein said step of displaying includes the step of overlaying said peak amplitudes and said corresponding positions on a graphic image of said data set.

4. An automated method of data processing and evaluation comprising the steps of: (a) selecting a data set having a plurality of peaks contained therein, wherein said data set is obtained from time-of-flight mass spectra; (b) developing a hypothesis which predicts a specified number of peaks present within a given window of said data set; (c) testing the hypothesis that the specified number of peaks is present within the given window; (d) maximizing a likelihood function to estimate the position and amplitude of said peaks present within the given window; (e) repeating said steps (b)–(d) for each of a plurality of windows with each of said windows partially overlapping a previous one of said windows, wherein peak amplitudes and corresponding positions in said data set are generated using the position and amplitude of said peaks present within each given window so-estimated by each said step of maximizing; and (f) outputting said peak amplitudes and said corresponding positions to an output device.
5. The method of claim 4, wherein said step of outputting comprises the step of displaying said peak amplitudes and said corresponding positions.
6. The method of claim 5, wherein said step of displaying includes the step of overlaying said peak amplitudes and said corresponding positions on a graphic image of said data set.